Automatic Speech Recognition Systems
Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts
Park, ChaeHun, Cho, Hojun, Choo, Jaegul
This paper explores integrating Automatic Speech Recognition (ASR) into natural language query systems to improve weather forecasting efficiency for Korean meteorologists. We address challenges in developing ASR systems for the Korean weather domain, specifically specialized vocabulary and Korean linguistic intricacies. To tackle these issues, we constructed an evaluation dataset of spoken queries recorded by native Korean speakers. Using this dataset, we assessed various configurations of a multilingual ASR model family, identifying performance limitations related to domain-specific terminology. We then implemented a simple text-to-speech-based data augmentation method, which improved the recognition of specialized terms while maintaining general-domain performance. Our contributions include creating a domain-specific dataset, comprehensive ASR model evaluations, and an effective augmentation technique. We believe our work provides a foundation for future advancements in ASR for the Korean weather forecasting domain.
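The TTS-based augmentation described above can be sketched roughly as follows. Everything here is an illustrative assumption: the English placeholder terms, the carrier templates, and the `tts_fn` stub (the paper works with Korean meteorological vocabulary and a real TTS engine, whose details are not given here).

```python
import numpy as np

def augment_with_tts(domain_terms, template_sentences, tts_fn, sample_rate=16000):
    """Build synthetic (audio, transcript) training pairs for domain terms.

    A minimal sketch of TTS-based data augmentation: each specialized term
    is inserted into carrier sentences, synthesized with a TTS engine, and
    paired with its transcript. `tts_fn` is a placeholder for any backend
    that maps text to a waveform (numpy array).
    """
    pairs = []
    for term in domain_terms:
        for template in template_sentences:
            text = template.format(term=term)
            audio = tts_fn(text)  # synthesize speech for the transcript
            pairs.append({"audio": audio, "text": text, "sr": sample_rate})
    return pairs

# Dummy TTS stand-in: 0.5 s of silence per request (real use: any TTS engine).
dummy_tts = lambda text: np.zeros(8000, dtype=np.float32)

augmented = augment_with_tts(
    domain_terms=["isobar", "geopotential height"],
    template_sentences=["Show me the {term} chart.", "What is the current {term}?"],
    tts_fn=dummy_tts,
)
print(len(augmented))  # 2 terms x 2 templates = 4 synthetic pairs
```

The synthetic pairs would then simply be appended to the real training set, which is why general-domain performance can be preserved.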
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Jin, Weifei, Cao, Yuxin, Su, Junjie, Shen, Qi, Ye, Kai, Wang, Derui, Hao, Jie, Liu, Ziyao
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafted adversarial perturbations enable the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modification. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples from Text-to-Speech (TTS) audio. However, style modifications driven by optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of a Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve an attack success rate of 82%, while preserving sound naturalness, as confirmed by our user study.
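The $\ell_p$-constrained perturbation attacks this paper contrasts itself with can be illustrated on a toy, differentiable stand-in model. The linear "score", its weights, and all hyperparameters below are invented for illustration only; they are not the authors' method, merely the generic projected-gradient template such attacks follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an ASR model: a fixed linear score over audio samples.
# A higher score means the (hypothetical) target command is recognized.
w = rng.normal(size=512)

def target_score(audio):
    return float(w @ audio)

def linf_attack(audio, epsilon=0.01, steps=50, alpha=0.002):
    """Projected gradient ascent under an l-infinity norm budget.

    Each step moves the audio along the sign of the score's gradient, then
    projects back into the epsilon-ball around the original signal. For the
    linear toy model the gradient of the score is simply w.
    """
    adv = audio.copy()
    for _ in range(steps):
        adv = adv + alpha * np.sign(w)                        # gradient-sign step
        adv = np.clip(adv, audio - epsilon, audio + epsilon)  # project into ball
    return adv

clean = rng.normal(scale=0.1, size=512)
adv = linf_attack(clean)
```

The result raises the target score while every sample stays within `epsilon` of the original, which is exactly the kind of bounded-but-audible artifact the style-transfer approach tries to avoid.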
A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems
Ghimire, Rupak Raj, Bal, Bal Krishna, Poudyal, Prakash
In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the work on Nepali Automatic Speech Recognition systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing Nepali ASR systems. In tandem with the global trend of ever-increasing research on speech recognition, the number of Nepali ASR-related projects is also growing. Nevertheless, the investigation of language and acoustic models for the Nepali language has not received adequate attention compared to languages that possess ample resources. In this context, we provide a framework as well as directions for future investigations.
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Sudarshan, Ankitha, Samuel, Vinay, Patwa, Parth, Amara, Ibtihel, Chadha, Aman
Automatic Speech Recognition (ASR) has attracted profound research interest. Recent breakthroughs have given ASR systems different prospects, such as faithfully transcribing spoken language, which is a pivotal advancement in building conversational agents. However, accurately discerning context-dependent words and phrases remains a persistent challenge. In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing, leveraging the power of deep learning models to deliver accurate transcriptions across a wide variety of vocabularies and speaking styles. Our solution uses Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Network (DNN) models, integrating both language and acoustic modeling for better accuracy. We use a transformer-based model to rescore the word lattice, achieving a palpable reduction in Word Error Rate (WER). We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
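In its simplest n-best form, the rescoring idea above reduces to combining each hypothesis's acoustic score with a weighted external LM score and keeping the best total. The toy LM, the hypotheses, and all scores below are invented for illustration; the paper operates on full word lattices with a transformer LM.

```python
import math

def rescore(nbest, lm_score_fn, lm_weight=0.5):
    """Rescore n-best ASR hypotheses with an external language model.

    Combines each hypothesis's acoustic-model log-score with a weighted LM
    log-score and returns the best-scoring text. `lm_score_fn` stands in
    for any LM (e.g. a transformer) returning a sentence log-probability.
    """
    best_text, best_total = None, -math.inf
    for text, am_logp in nbest:
        total = am_logp + lm_weight * lm_score_fn(text)
        if total > best_total:
            best_text, best_total = text, total
    return best_text

# Toy LM: prefers the hypothesis containing a contextually likely phrase.
toy_lm = lambda s: 0.0 if "recognize speech" in s else -5.0

nbest = [
    ("wreck a nice beach", -1.0),   # slightly better acoustic score
    ("recognize speech", -1.5),
]
print(rescore(nbest, toy_lm))  # LM evidence flips the ranking
```

This is how context-dependent phrases can win even when their raw acoustic score is lower.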
Focus on Whisper, OpenAI's automatic speech recognition system - Actu IA
OpenAI recently released Whisper, a 1.6-billion-parameter AI model capable of transcribing and translating speech audio from 97 different languages, showing robust performance on a wide range of automatic speech recognition (ASR) tasks. The model, trained on 680,000 hours of audio data collected from the web, was soon published as open source on GitHub. Whisper uses a transformer encoder-decoder architecture: the input audio is split into 30-second chunks, converted to a log-Mel spectrogram, and then passed through an encoder. Unlike most state-of-the-art ASR models, it has not been fine-tuned on a specific dataset; instead, it has been trained using weak supervision on a large-scale noisy dataset collected from the Internet. Although it did not beat models specialized for LibriSpeech, in zero-shot evaluations on a diverse set of datasets Whisper proved more robust, making 50% fewer errors than those models.
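The fixed-length front-end described above can be sketched as follows; this is a rough illustration of the 30-second windowing only, not Whisper's actual preprocessing code (the exact details, including the log-Mel step that follows, live in the openai/whisper implementation).

```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk

def chunk_audio(audio):
    """Split audio into fixed 30-second windows, zero-padding the last one.

    Whisper-style models consume fixed-length 30 s segments, each of which
    is later converted to a log-Mel spectrogram before the encoder.
    """
    chunks = []
    for start in range(0, max(len(audio), 1), N_SAMPLES):
        chunk = audio[start:start + N_SAMPLES]
        if len(chunk) < N_SAMPLES:
            chunk = np.pad(chunk, (0, N_SAMPLES - len(chunk)))  # pad with silence
        chunks.append(chunk)
    return chunks

# 70 s of audio -> three 30 s chunks (the last one padded with silence).
chunks = chunk_audio(np.zeros(70 * SAMPLE_RATE, dtype=np.float32))
print(len(chunks))  # 3
```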
Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
Safonova, Anastasia, Yudina, Tatiana, Nadimanov, Emil, Davenport, Cydnie
The following paper presents a project focused on the research and creation of a new Automatic Speech Recognition (ASR) system based on the Chukchi language. There is no complete corpus of the Chukchi language, so most of the work consisted of collecting audio and texts in the Chukchi language from open sources and processing them. We managed to collect 21:34:23 hours of audio recordings and 112,719 sentences (or 2,068,273 words) of text in the Chukchi language. The XLSR model was trained on the obtained data and showed good results even with a small amount of data. Besides being a low-resource language, Chukchi is also polysynthetic, which significantly complicates any automatic processing. Thus, the usual WER metric for evaluating ASR becomes less indicative for a polysynthetic language. However, the CER metric showed good results. The question of metrics for polysynthetic languages remains open.
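The WER-versus-CER gap the authors describe is easy to see on a single long word: one wrong character makes the whole word wrong at the word level. The pseudo-polysynthetic word below is invented for illustration (it is not real Chukchi), and the metric definitions are the standard edit-distance ones.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match (in that order)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref, hyp):
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

# One long polysynthetic-style word with a single wrong character:
ref = "ganymylqorawetgawyrkyn"
hyp = "ganymylqorawetgavyrkyn"
print(wer(ref, hyp), round(cer(ref, hyp), 3))  # 1.0 0.045
```

A single substitution yields a WER of 100% but a CER of under 5%, which is why CER is the more indicative metric here.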
An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
An independent, automated method of decoding and transcribing oral speech is known as automatic speech recognition (ASR). A typical ASR system extracts features from audio recordings or streams and runs one or more algorithms to map the features to corresponding texts. A great deal of research has been done in the field of speech signal processing in recent years. When given adequate resources, both conventional ASR and emerging end-to-end (E2E) speech recognition have produced promising results. However, for low-resource languages like Bengali, the current state of ASR lags behind, even though the language is spoken by over 500 million people all over the world. Despite its popularity, there aren't many diverse open-source datasets available, which makes it difficult to conduct research on Bengali speech recognition systems. This paper is part of the competition named `BUET CSE Fest DL Sprint'. Its purpose is to improve the speech recognition performance of the Bengali language by adopting speech recognition technology with an E2E structure based on the transfer learning framework. The proposed method effectively models the Bengali language and achieves a score of 3.819 in `Levenshtein Mean Distance' on the test dataset of 7,747 samples, when only 1,000 samples of the training dataset were used for training.
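The "Levenshtein Mean Distance" metric presumably averages the character-level edit distance between each reference and hypothesis over the test set; a minimal sketch under that assumption follows (the competition's exact definition, e.g. any normalization, may differ, and the example strings are illustrative).

```python
def levenshtein(a, b):
    """Classic DP Levenshtein distance (insert/delete/substitute, cost 1)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return dp[m][n]

def mean_levenshtein(refs, hyps):
    """Average edit distance over a test set of (reference, hypothesis) pairs."""
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / len(refs)

refs = ["kitten", "flaw"]
hyps = ["sitting", "lawn"]
print(mean_levenshtein(refs, hyps))  # (3 + 2) / 2 = 2.5
```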
How To Fool an Eavesdropping AI … With Another AI
Scientists at Columbia University in New York City think they've devised an AI that can effectively prevent an eavesdropping automatic speech recognition system from transcribing your private conversation. So in the future, you may not have to worry that someone is using spyware to record your phone calls, or that your Alexa is listening in when it shouldn't be. Their Neural Voice Camouflage system prevents eavesdroppers from secretly transcribing your audio conversation by piggybacking a custom static-type noise over your speech. The noise is set to the same volume as normal background noise--no louder than a regular background air conditioning unit--so people you're talking to can still easily make out what you're saying. However, the automatic speech recognition (ASR) system that's attempting to eavesdrop will get confused and produce a gobbledygook transcription. This process of producing a custom background noise is more complicated than it seems.
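The "same volume as normal background noise" idea amounts to mixing the masking signal at a level matched to the speech. The sketch below illustrates only that RMS level-matching step, with random noise and invented parameters; the actual Neural Voice Camouflage system *predicts* the noise with a neural network rather than using random static.

```python
import numpy as np

def mix_at_level(speech, noise, level_db=0.0):
    """Overlay noise on speech, scaled to a target level relative to speech.

    Scales the noise so its RMS sits `level_db` decibels relative to the
    speech RMS (0 dB = equal loudness), then adds it to the signal.
    """
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = (rms(speech) / rms(noise)) * 10 ** (level_db / 20)
    return speech + gain * noise

rng = np.random.default_rng(1)
speech = rng.normal(scale=0.2, size=16000)   # 1 s of stand-in "speech"
noise = rng.normal(scale=1.0, size=16000)    # stand-in masking noise
mixed = mix_at_level(speech, noise, level_db=0.0)
```

At 0 dB the added noise carries exactly the same energy as the speech; a negative `level_db` (e.g. -10) would make it quieter, like distant air conditioning.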
Stopping Smart Devices From Spying on You - Neuroscience News
Summary: Researchers have developed a new AI algorithm that prevents smart devices such as Alexa or Siri from correctly hearing your words 80% of the time. The algorithm is a step toward giving people agency to protect the privacy of their voice in the presence of smart devices. Ever noticed online ads following you that are eerily close to something you've recently talked about with your friends and family? Microphones are embedded into nearly everything today, from our phones, watches, and televisions to voice assistants, and they are always listening to you. Computers are constantly using neural networks and AI to process your speech, in order to gain information about you.
Stopping 'them' from spying on you: New AI can block rogue microphones
If you wanted to prevent this from happening, how could you go about it? Back in the day, as portrayed in the hit TV show "The Americans," you would play music with the volume way up or turn on the water in the bathroom.